Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization
Authors
Abstract
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, logits for ‘dog’ or even a caption) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. VQA) or reinforcement learning, without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) are robust to adversarial images, (c) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention-based models can localize inputs. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks, and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one. Our code is available at https://github.com/ramprs/grad-cam/ and a demo is available on CloudCV at http://gradcam.cloudcv.org [2]. A video of the demo can be found at youtu.be/COjUB9Izk6E.
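The abstract's core mechanism can be sketched in a few lines: global-average-pool the gradients of the target score over the final convolutional layer's feature maps to get one importance weight per channel, take the weighted combination of the forward activation maps, and apply a ReLU to keep only positively influential regions. The following is a minimal numpy sketch under assumed array shapes, not the authors' released code; the function name `grad_cam` and the `(channels, height, width)` layout are illustrative assumptions.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Coarse localization map from final-conv activations and gradients.

    activations: (K, H, W) feature maps of the final convolutional layer
    gradients:   (K, H, W) gradient of the target class score w.r.t. those maps
    Returns an (H, W) map, normalized to [0, 1] when non-zero.
    """
    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted combination of the forward activation maps.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # ReLU: keep only features with a positive influence on the target class.
    cam = np.maximum(cam, 0.0)
    # Normalize for visualization (skip if the map is all zeros).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the resulting coarse map is upsampled to the input resolution and, as the abstract notes, can be fused with a fine-grained visualization (e.g. pointwise multiplication with guided backpropagation) to obtain a high-resolution class-discriminative result.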
Similar Resources
Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision-based problems. However, deep models are perceived as "black box" methods, given the lack of understanding of their internal functioning. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction....
Grad-CAM: Why did you say that?
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are ‘important’ for predictions – producing visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixe...
Dream Formulations and Deep Neural Networks: Humanistic Themes in the Iconology of the Machine-Learned Image
This paper addresses the interpretability of deep learning-enabled image recognition processes in computer vision science in relation to theories in art history and cognitive psychology on the vision-related perceptual capabilities of humans. Examination of what is determinable about the machine-learned image in comparison to humanistic theories of visual perception, particularly in regard to a...
Visual Explanations from Hadamard Product in Multimodal Deep Networks
Visual explanation of a model's learned representation helps in understanding the fundamentals of learning. Previous attentional models visualized the attended regions over an image or text using their learned weights to confirm their intended mechanism. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well-known for the joint function ...
Deep Adaptive Networks for Visual Data Classification
This paper proposes a classifier called deep adaptive networks (DAN), based on deep belief networks (DBN), for visual data classification. First, we construct directed deep belief nets by using a set of Restricted Boltzmann Machines (RBM) and a Gaussian RBM via greedy, layerwise unsupervised learning. Then, we refine the parameter space of the deep architecture to adapt the classification re...
Journal: CoRR
Volume: abs/1610.02391
Pages: -
Publication year: 2016